From the TED Talk by Kenneth Cukier: Big data is better data
Unscramble the Blue Letters
Machine learning is at the basis of many of the things that we do online: seacrh engines, Amazon's personalization algorithm, computer taoarnsitln, voice recognition systems. Researchers recently have looked at the qoeuitsn of biopsies, cancerous biipseos, and they've asked the computer to identify by looking at the data and survival rates to determine whether cells are actually cancerous or not, and sure enough, when you throw the data at it, through a machine-learning algorithm, the machine was able to idenitfy the 12 telltale signs that best predict that this biopsy of the breast cancer cells are indeed cancerous. The problem: The medical literature only knew nine of them. Three of the ttaris were ones that pploee didn't need to look for, but that the machine spotted.
Open Cloze
Machine learning is at the basis of many of the things that we do online: ______ engines, Amazon's personalization algorithm, computer ___________, voice recognition systems. Researchers recently have looked at the ________ of biopsies, cancerous ________, and they've asked the computer to identify by looking at the data and survival rates to determine whether cells are actually cancerous or not, and sure enough, when you throw the data at it, through a machine-learning algorithm, the machine was able to ________ the 12 telltale signs that best predict that this biopsy of the breast cancer cells are indeed cancerous. The problem: The medical literature only knew nine of them. Three of the ______ were ones that ______ didn't need to look for, but that the machine spotted.
Solution
- question
- traits
- people
- search
- biopsies
- identify
- translation
Original Text
Machine learning is at the basis of many of the things that we do online: search engines, Amazon's personalization algorithm, computer translation, voice recognition systems. Researchers recently have looked at the question of biopsies, cancerous biopsies, and they've asked the computer to identify by looking at the data and survival rates to determine whether cells are actually cancerous or not, and sure enough, when you throw the data at it, through a machine-learning algorithm, the machine was able to identify the 12 telltale signs that best predict that this biopsy of the breast cancer cells are indeed cancerous. The problem: The medical literature only knew nine of them. Three of the traits were ones that people didn't need to look for, but that the machine spotted.
Frequently Occurring Word Combinations
N-grams of length 2

| collocation | frequency |
|---|---|
| big data | 14 |
| arthur samuel | 6 |
| machine learning | 4 |
| favorite pie | 2 |
| supermarket sales | 2 |
| smaller amounts | 2 |
| term big | 2 |
| small data | 2 |
| national security | 2 |
| security agency | 2 |
| martin luther | 2 |
| telltale signs | 2 |
| samuel knew | 2 |
N-grams of length 3

| collocation | frequency |
|---|---|
| term big data | 2 |
| national security agency | 2 |
| arthur samuel knew | 2 |
Important Words
- algorithm
- asked
- basis
- biopsies
- biopsy
- breast
- cancer
- cancerous
- cells
- computer
- data
- determine
- engines
- identify
- knew
- learning
- literature
- looked
- machine
- medical
- people
- personalization
- predict
- question
- rates
- recognition
- researchers
- search
- signs
- spotted
- survival
- systems
- telltale
- throw
- traits
- translation
- voice